ABSTRACT

Semi-custom design of high-performance VLSI processors has been demonstrated by the Berkeley
VLSI-PLM chip, which was built using the Mentor Graphics IDEA Station, Cell Station tools, and NCR
tools. To support semi-custom design using Berkeley VLSI tools such as LagerIV, we have developed a
set of cells. These cells were designed with the goal of building a high-performance VLSI Parallel
Prolog Processor, but they can also be used in other designs such as DSP chips. Some of these cells
complement those in LagerIV's DPP cell library; others provide an area-efficient replacement for the
DPP cells.
1.  Introduction

The LagerIV [Bro88] silicon assembly system supports semi-custom design of microprocessors
using standard cells and macro cells. Semi-custom design eliminates many of the problems at the
physical layout level at the expense of area and performance. Since the use of standard cells reduces
the performance of processors and uses more die area, other alternatives are needed to satisfy the
high-performance requirements of processors. This has motivated the development of larger macro
cells. Standard macro cells allow rapid semi-custom design, reduce chip area, and improve the speed
of the processor. In designing a complex system such as the Parallel Prolog Processor [DeS88], macro
cells provide the designer with a basis for rapid design without sacrificing the area and performance
of the chip. In this report we describe the design of area-efficient cells for building a
high-performance processor such as the Prolog processor.

The Parallel Prolog Processor (PPP) [DeS88] is a high-performance processor that will support
symbolic languages such as Prolog and LISP. Because of the complexity of designing such a processor,
especially when high performance is an issue, a semi-custom VLSI design was chosen. Development is
faster with a semi-custom approach since we avoid many of the physical-level design details inside
the cells. Since the cells available in standard libraries did not meet our specific demands, we
decided to add area-efficient cells to an existing library, the LagerIV Datapath library [Bro88]
(also referred to as the DPP library). The critical component in the datapath of the processor is the
ALU, so we selected the ALU as the basis for building a complete library of cells. Section 2 contains
the first part of the design, from logic gates to layout. Section 3 describes the geometry and the
rules of the layout design. A comparison of the newly created cells with their DPP equivalents is
given in Section 4.

2.  Description  of  Cells 

The layout design followed the logic design of the arithmetic logic unit (ALU), which is based on a
carry bypass scheme. The design and simulation of the ALU at the gate level were done using Mentor
Graphics' tools on Apollo workstations. Under best-case conditions, the ALU did a 32-bit add in
35 ns; under worst-case conditions it did the addition in 50 ns. These figures resulted from using
NCR's 1.5 micron standard cell library. We needed a 40% improvement in speed using the proposed
extended cell library.
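
As an illustration of the carry bypass scheme mentioned above, the following behavioral sketch models
a 32-bit carry-bypass (carry-skip) adder. It is not the gate-level design of the PPP ALU: the function
name, the 4-bit block size, and the block organization are assumptions made only for this example.

```python
def carry_bypass_add(a, b, carry_in=0, width=32, block=4):
    """Behavioral model of a carry-bypass (carry-skip) adder.

    The block size of 4 is an illustrative assumption; it must divide width.
    Returns a (sum, carry_out) pair.
    """
    if width % block != 0:
        raise ValueError("block size must divide the word width")
    word_mask = (1 << width) - 1
    block_mask = (1 << block) - 1
    a &= word_mask
    b &= word_mask
    result = 0
    carry = carry_in & 1
    for low_bit in range(0, width, block):
        x = (a >> low_bit) & block_mask
        y = (b >> low_bit) & block_mask
        total = x + y + carry  # ripple addition inside the block
        result |= (total & block_mask) << low_bit
        # If every bit position propagates (x XOR y is all ones), the block
        # carry-out equals its carry-in, so the carry bypasses the ripple
        # chain and is left unchanged. Otherwise the carry-out is generated
        # or killed inside the block.
        if (x ^ y) != block_mask:
            carry = total >> block
    return result, carry


total, carry_out = carry_bypass_add(0xFFFFFFFF, 1)
print(hex(total), carry_out)  # prints 0x0 1
```

In hardware the bypass shortens the worst-case carry path, because a carry entering a fully
propagating block does not have to ripple through every bit of that block.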

About twelve logic gates were used in the design of the ALU. Most of these are basic gates such as
2-input AND, OR, and XOR gates and a 2-input multiplexer. Additional cells were needed in other parts
of the datapath. In all, twenty-three cells were added to the Lager DPP library. Some of the cells
are tri-stated demultiplexers with 3, 4, and 5 inputs, and multiplexers with 3, 4, 5, and 6 inputs.
A buffer/inverter and a D flip-flop with high fan-outs were also designed for other parts of the
datapath of the PPP.

3.  Geometry  of  Cells 

Since the new cells are an extension of the LagerIV DPP cells, the DPP layout conventions and
standards were followed in order to keep the cells compatible. First, Magic was the main graphics
editor used for the development of the layout. Second, every cell was converted to its physical
instance in the OCT environment because the router we were using was compatible with the OCT tools
[Spi88]. Since Magic is a physical layout editor, one could not rely on a compactor to give the
optimum area for a design, so manual design was needed.

In addition to satisfying the design rules and the electrical functionality of the circuit, the
height of each cell had to be approximately fifty lambda units. Each cell also had to be split into
two regions, an N-Well region and a P-Well region, separated by control, power, and ground lines
crossing the cells vertically in metal1. Data lines were to cross the cells horizontally in metal2 or
polysilicon. Polysilicon was used for data lines only when these lines did not exceed 100 lambda
units in length, because polysilicon has a higher resistivity than metal. Horizontal feed lines in
metal2 were added whenever possible, allowing bus lines to travel across the cells without
complicating the routing. Figure 1 shows the general layout geometry.

The approach to the layout was to draw the data buses horizontally and the control lines vertically.
After defining the outer geometry, transistors were laid out taking advantage of any shared contact
or connection for the sole purpose of saving area. After the necessary circuitry was laid out, the
feed lines were added wherever there was enough space to do so. One of the biggest challenges was the
routing of some control signals and data lines inside the cells. Even though planning and systematic
design reduce the likelihood of such problems, internal routing became the main difficulty in keeping
each cell's area as planned. The solution to this frequent inconvenience was to use poly, metal1, and
metal2 interconnections as long as they did not interfere with the neighboring and underlying
materials. The circuit diagrams of the cells are in Figures 2-21, and the layouts are in Plots 2-21.
The transistor sizes are in lambda units.
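
The layer convention for data lines described above can be summarized in a small helper. This is only
a sketch of the stated rule; the function and constant names are hypothetical and are not part of
Magic, OCT, or the LagerIV tools.

```python
# Maximum data-line length (in lambda) for which polysilicon is acceptable,
# as stated in the text. The name is hypothetical.
POLY_MAX_LENGTH_LAMBDA = 100


def allowed_data_layers(length_lambda):
    """Return the layers permitted for a horizontal data line of this length."""
    if length_lambda < 0:
        raise ValueError("length must be non-negative")
    if length_lambda <= POLY_MAX_LENGTH_LAMBDA:
        # Short runs may use either layer.
        return ("metal2", "polysilicon")
    # Longer runs avoid polysilicon because of its higher resistivity.
    return ("metal2",)


print(allowed_data_layers(80))   # prints ('metal2', 'polysilicon')
print(allowed_data_layers(150))  # prints ('metal2',)
```
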
4.  Comparison  with  DPP  library

Table 1 lists the size of every new cell in lambda squared, compared with the size of the equivalent
cell built from cells in the DPP library. The equivalent cells, however, do not represent the final
area, since the area used for routing interconnections was not included. Thus, the percentage
decrease in area is a raw figure, and the eventual decrease would be larger. The cells buf4 and buf5
used up more area than their DPP equivalents precisely because the area used for interconnecting the
DPP cells was not considered. Routing used up a high percentage of the area due to obstacles between
input and output lines.


Table 1

Comparison of new and old cells

New Cell          Area    dpp Cell            dpp Cell Area   dpp Total Area   % Decrease
-----------------------------------------------------------------------------------------
2 inputs AND      1092    ANDNOR              4492            4492             75.69%
2 inputs NAND     1092    NANDNOR             3136            3136             65.17%
2 inputs OR       1092    NA                  NA              NA               -
2 inputs NOR      1092    ANDNOR              4492            4492             75.69%
2 inputs XOR      2940    XORNOR              NA              NA               -
2 inputs XNOR     2698    XOR+INV             NA              NA               -
4 inputs AND      1558    3(ANDNOR)           4492            13476            88.43%
4 inputs NOR      1520    3(ANDNOR)           4492            13476            88.72%
3 inputs NAND     1184    2(NANDNOR)          3136            6272             81.12%
4 inputs NAND     1558    3(NANDNOR)          3136            9408             83.43%
5 inputs NAND     1911    4(NANDNOR)          3136            12544            84.76%
6 inputs NAND     2233    5(NANDNOR)          3136            15680            85.75%
3 inputs Buffer   5311    3(1 input Buffer)   1900            5700             6.82%
4 inputs Buffer   8208    4(1 input Buffer)   1900            7600             -8.00%
5 inputs Buffer   9792    5(1 input Buffer)   1900            9500             -3.07%
3 to 1 MUX        5082    2(2 to 1 MUX)       3450            6900             26.34%
4 to 1 MUX        6930    3(2 to 1 MUX)       3450            10350            33.04%
5 to 1 MUX        10626   4(2 to 1 MUX)       3450            13800            23.00%
6 to 1 MUX        13622   5(2 to 1 MUX)       3450            17250            21.03%
AB + CD           2747    3(NANDNOR)          3136            9408             70.80%
SUM CELL          5967    NA                  NA              NA
32 Lines Driver   6380    NA                  NA              NA
DFF               6897    NA                  NA              NA

NA: Not Available.
Areas in lambda squared.
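
The arithmetic behind the last two columns of Table 1 can be sketched as follows, assuming that the
dpp total area is the dpp cell area multiplied by the number of dpp cells and that the percentage
decrease is taken relative to that total. The function names are hypothetical. Several printed
entries (for example 65.17% for the 2-input NAND) are 0.01 below the rounded result, which would be
consistent with truncation rather than rounding in the original table.

```python
def dpp_total_area(dpp_cell_area, cell_count=1):
    """Area of the equivalent built from cell_count dpp cells, routing excluded."""
    return dpp_cell_area * cell_count


def percent_decrease(new_area, old_area):
    """Area saving of the new cell relative to the dpp equivalent, in percent."""
    return 100.0 * (old_area - new_area) / old_area


# 4-input AND: three ANDNOR cells of 4492 lambda squared each.
print(dpp_total_area(4492, 3))                    # prints 13476

# 2-input AND: 1092 versus one ANDNOR cell of 4492 lambda squared.
print(round(percent_decrease(1092, 4492), 2))     # prints 75.69

# 4-input buffer: the new cell is larger, so the decrease is negative.
print(round(percent_decrease(8208, 7600), 2))     # prints -8.0
```
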
5.  Conclusion

Since full-custom design is time consuming and hard to debug, semi-custom design has emerged as an
alternative for fast layout generation and simulation. However, the variety of cells available in a
given library may not be enough for some designs, which justifies the development of an extended
library. We have developed twenty-three cells that follow the LagerIV DPP library guidelines and meet
our needs for the layout generation of the arithmetic unit of the Parallel Prolog Processor. These
new cells will allow us to rapidly generate a layout for the PPP that occupies much less area than a
layout using the regular DPP cells. In addition, the performance of the processor will be enhanced by
the use of these cells.

[Figure 1: general layout geometry of the cells. Label: DATA or FEED LINES]

* Polysilicon is used for data lines only when they do not exceed 100 lambda in length; otherwise
metal2 is used for data lines.